Skip to content

client: don't synthesize the default port into the Host header#1318

Open
krynju wants to merge 4 commits into
JuliaWeb:masterfrom
krynju:kr/host-header-no-default-port
Open

client: don't synthesize the default port into the Host header#1318
krynju wants to merge 4 commits into
JuliaWeb:masterfrom
krynju:kr/host-header-no-default-port

Conversation

@krynju

@krynju krynju commented Jun 23, 2026

Copy link
Copy Markdown

Found this issue when porting to HTTP.jl 2 and AWS S3 presigned links were not working as is.

The problem

For a URL without an explicit port, the client puts the scheme's default
port into the Host header:

HTTP.get("https://example.com/")
# sends:  Host: example.com:443

The Host header should mirror the URL's authority as written. A bare-host
URL should send a bare Host: example.com. HTTP.jl 1.x did this; 2.0 regressed
because request.host is now built from the connection address
(example.com:443), which always carries a port for dialing.

Why it matters

Host: example.com:443 is technically legal (RFC 9110 §7.2 allows a port), but
many servers compare the Host value verbatim and a synthesized default port
breaks them. The case that bit us is AWS S3 presigned URLs (SigV4):

  • The signature is computed over the canonical host s3.amazonaws.com.
  • HTTP.jl 2.0 sends Host: s3.amazonaws.com:443.
  • S3 recomputes the signature from the received Host, gets a different value,
    and rejects the request: 403 SignatureDoesNotMatch.

The same request succeeds with curl and with HTTP.jl 1.x, which send a bare
Host.

What other clients do

Go's net/http keeps the Host header equal to the URL authority and adds the
default port only to the dial address, never to Host:

req, _ := http.NewRequest("PUT", "https://s3.amazonaws.com/b/k", body)
httputil.DumpRequestOut(req, false)
//   PUT /b/k HTTP/1.1
//   Host: s3.amazonaws.com            <- bare; :443 not added

req2, _ := http.NewRequest("PUT", "https://s3.amazonaws.com:443/b/k", body)
//   Host: s3.amazonaws.com:443        <- explicit port preserved

curl and HTTP.jl 1.x behave the same way. HTTP.jl 2.0 is the outlier.

The fix

Add a host_header view on the parsed URL that mirrors the authority as
written — an explicit port is preserved, the default port is never synthesized
(and IPv6 brackets are kept) — and build request.host from it. The connection
address is unchanged, so dialing still targets the correct port.

URL dial address (unchanged) Host header (before → after)
https://example.com/ example.com:443 example.com:443example.com
https://example.com:443/ example.com:443 example.com:443example.com:443
http://minio:9000/ minio:9000 minio:9000minio:9000
https://[2001:db8::1]/ [2001:db8::1]:443 [2001:db8::1]:443[2001:db8::1]

Unit tests added in test/http_client_tests.jl covering bare host, explicit
default port, http default port, a custom port, and IPv6 (bracketed) forms.

Verified end-to-end: a real S3 presigned PUT that returns 403 on current
master returns 200 with this change.

For a URL without an explicit port, the client built the request `Host`
header from the connection address, which always carries the scheme's
default port (`example.com:443`). The Host header should mirror the URL
authority as written, so a bare-host URL must send a bare `Host`.

Sending `Host: host:443` is legal per RFC 9110 but breaks any server that
treats the Host verbatim. Concretely it breaks AWS SigV4 presigned URLs:
the signature is computed over the canonical host (`s3.amazonaws.com`), so
a request whose Host carries `:443` is rejected with SignatureDoesNotMatch
(403). Go's net/http, curl and HTTP.jl 1.x all keep the Host as written.

Add a `host_header` view on the parsed URL that preserves an explicit port
but never synthesizes the default one (keeping IPv6 brackets), and build
`request.host` from it. The connection address is unchanged, so dialing
still targets the right port.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.71%. Comparing base (a1eb82b) to head (73936f2).

Files with missing lines Patch % Lines
src/http_client_url.jl 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1318      +/-   ##
==========================================
+ Coverage   87.67%   87.71%   +0.04%     
==========================================
  Files          30       30              
  Lines       11739    11747       +8     
==========================================
+ Hits        10292    10304      +12     
+ Misses       1447     1443       -4     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@krynju krynju marked this pull request as ready for review June 23, 2026 14:49
@quinnj

quinnj commented Jun 24, 2026

Copy link
Copy Markdown
Member

Thanks for the focused fix. I think this needs one more pass before merge: the new host_header value is only wired through the high-level request path, so a few sibling client paths still synthesize default ports into Host.

  • HTTP.open / streaming requests still construct the request with parsed.address, so HTTP.open("PUT", "https://example.com/upload") will still send Host: example.com:443.
  • Redirects reset current_request.host = current_address, and WebSocket handshakes use host=parsed.address initially and after redirects, so default-port authorities can still leak there too.

Could you route those paths through the same authority-as-written value and add focused regressions for streaming and redirect/WebSocket coverage?

[findings by codex; reviewed by quinnj]

…ader

The previous commit stopped synthesizing the scheme's default port into the
Host header, but only for the high-level request path. The sibling client
paths still built the request from the connection address, so a default-port
URL leaked Host: host:443 (and broke AWS SigV4) on:

- HTTP.open / streaming requests (built from parsed.address)
- HTTP and WebSocket redirects (reset Host to current_address per hop)
- WebSocket handshakes (initial host=parsed.address)

Wire all of them through the authority-as-written value:

- Streaming and WebSocket init now use parsed.host_header.
- _resolve_redirect_target also returns the next hop's host_header (parsed
  for a host-changing hop, current host carried through for a relative hop),
  and both redirect loops reset request.host from it instead of the address.

The dial address is unchanged everywhere, so connections still target the
right port. Matches Go's net/http, verified empirically: Go sends the URL
authority as written on the initial request and derives the Host from the
next URL on a redirect, never synthesizing the default port.

Adds focused regressions for the streaming constructor and the redirect
resolver (including explicit-default-port preservation), plus a WebSocket
handshake Host assertion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@krynju

krynju commented Jun 24, 2026

Copy link
Copy Markdown
Author

Thanks for the focused fix. I think this needs one more pass before merge: the new host_header value is only wired through the high-level request path, so a few sibling client paths still synthesize default ports into Host.

  • HTTP.open / streaming requests still construct the request with parsed.address, so HTTP.open("PUT", "https://example.com/upload") will still send Host: example.com:443.
  • Redirects reset current_request.host = current_address, and WebSocket handshakes use host=parsed.address initially and after redirects, so default-port authorities can still leak there too.

Could you route those paths through the same authority-as-written value and add focused regressions for streaming and redirect/WebSocket coverage?

[findings by codex; reviewed by quinnj]

I think this is fully covered now in 5729fdc

@quinnj

quinnj commented Jun 24, 2026

Copy link
Copy Markdown
Member

Thanks for the quick follow-up. The original high-level, streaming, WebSocket, and redirect Host handling gaps look fixed now. I found one remaining low-level redirect regression: _resolve_redirect_target now requires current_host_header::String, but _do_incoming! passes current_request.host directly.

For HTTP.do! callers that provide Host only in headers (request.host === nothing), a redirect now throws MethodError instead of following the redirect. I reproduced this with HTTP.do!(client, addr, HTTP.Request("GET", "/start"; headers=["Host" => addr], ...); redirect_limit=1, protocol=:h1).

[findings by codex; reviewed by quinnj]

krynju and others added 2 commits June 24, 2026 19:35
The streaming and redirect paths got focused host_header regressions, but
the WebSocket coverage only asserted the explicit-port branch. Exercise the
ws->http / wss->https mapping in `_parse_websocket_url` directly: a
default-port wss/ws URL must yield a bare `Host`, while an explicit or custom
port is preserved and the dial address keeps its port.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit threaded `current_request.host` into the new
`current_host_header::String` parameter of `_resolve_redirect_target`. But
`Request.host` is `Union{Nothing,String}` and is `nothing` for low-level
`do!` callers that pin `Host` only in the request headers, so a redirect threw
a MethodError instead of being followed.

Loosen the parameter to `Union{Nothing,String}`. On a relative redirect with
no parsed host, fall back to the dial `current_address` (carrying the port, as
before this PR) instead of propagating `nothing` — `_prepare_request_for_redirect`
strips the `Host` header each hop, so returning `nothing` would drop the
`Host` header from the redirected request entirely. Absolute redirects keep
returning the parsed `host_header`.

Adds a unit case for the `nothing` current host and an end-to-end regression:
a `do!` request with `Host` only in headers (`request.host === nothing`)
following a relative redirect now completes and the redirected hop still
carries a valid `Host`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@krynju

krynju commented Jun 24, 2026

Copy link
Copy Markdown
Author

Thanks for the quick follow-up. The original high-level, streaming, WebSocket, and redirect Host handling gaps look fixed now. I found one remaining low-level redirect regression: _resolve_redirect_target now requires current_host_header::String, but _do_incoming! passes current_request.host directly.

For HTTP.do! callers that provide Host only in headers (request.host === nothing), a redirect now throws MethodError instead of following the redirect. I reproduced this with HTTP.do!(client, addr, HTTP.Request("GET", "/start"; headers=["Host" => addr], ...); redirect_limit=1, protocol=:h1).

[findings by codex; reviewed by quinnj]

Nice catch! Fixed in 73936f2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants